An Evaluation of Discretization Methods for Learning Rules from Biomedical Datasets
Abstract
Rule learning has the major advantage of being understandable to human experts when performing knowledge discovery in the biomedical domain. Many rule learning algorithms require discrete data in order to learn IF-THEN rule sets, which makes the choice of discretization technique an important step in rule learning. We compare the performance of one standard technique, Fayyad and Irani’s Minimum Description Length Principle (MDLP) criterion, which is the de facto discretization method in many machine learning packages, with that of a new Efficient Bayesian Discretization (EBD) method, and show that EBD leads to significant gains in performance, especially as the complexity of the rule learner increases.
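To make the baseline concrete, the following is a minimal sketch of Fayyad and Irani's entropy-based MDLP discretization as it is commonly described: recursively choose the binary cut that minimizes the weighted class entropy, and keep it only if the information gain exceeds the MDL threshold (log2(N-1) + Delta)/N. It is not the implementation used in the paper or in any particular machine learning package, and the function names `mdlp_cut_points` and `discretize` are hypothetical. EBD is not sketched because its scoring function is not described here.

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Class entropy Ent(S) in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cut_points(values, labels):
    """Hypothetical helper: sorted cut points for one continuous attribute,
    found by recursive entropy splitting with the MDLP stopping rule."""
    pairs = sorted(zip(values, labels))
    cuts = []
    _split(pairs, cuts)
    return sorted(cuts)


def _split(pairs, cuts):
    n = len(pairs)
    if n < 2:
        return
    labels = [y for _, y in pairs]
    ent_s = entropy(labels)

    # Evaluate every midpoint between distinct adjacent values (a superset of
    # the boundary points that Fayyad and Irani showed are sufficient).
    best = None  # (weighted entropy E(A,T;S), split index, cut value)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        e = (i / n) * entropy(labels[:i]) + ((n - i) / n) * entropy(labels[i:])
        if best is None or e < best[0]:
            best = (e, i, (pairs[i - 1][0] + pairs[i][0]) / 2)
    if best is None:  # all values identical: no cut possible
        return

    e, i, cut = best
    left, right = labels[:i], labels[i:]
    gain = ent_s - e
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    threshold = (math.log2(n - 1) + delta) / n

    # MDLP criterion: accept the cut only if the gain exceeds the threshold.
    if gain <= threshold:
        return
    cuts.append(cut)
    _split(pairs[:i], cuts)
    _split(pairs[i:], cuts)


def discretize(values, cuts):
    """Map each continuous value to the index of its interval."""
    return [bisect.bisect_right(cuts, v) for v in values]
```

For example, `mdlp_cut_points([1, 2, 3, 4, 5, 6, 7, 8], ["a"] * 4 + ["b"] * 4)` yields a single cut at 4.5, and `discretize` then turns the attribute into the interval indices a rule learner can use in IF-THEN conditions. Small or noisy samples often produce no cut at all, because the MDL threshold grows as N shrinks.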
Similar resources
An Evolutionary Multi-objective Discretization based on Normalized Cut
Learning models and related results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results tend to be incorrect. Therefore, discretization, as one of the preprocessing techniques, plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...
Analyzing Data Clusters: A Rough Sets Approach to Extract Cluster-Defining Symbolic Rules
In this paper we present a strategy together with its computational implementation to intelligently analyze data clusters in terms of symbolic cluster-defining rules. We present a symbolic rule extraction workbench that leverages rough set theory to inductively extract CNF form symbolic rules from un-annotated continuous-valued data-vectors. Our workbench purports a hybrid rule extraction metho...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High-dimensional microarray datasets are difficult to classify since they have many features, a small number of instances, and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
Hybrid System based on Rough Sets and Genetic Algorithms for Medical Data Classifications
Computational intelligence provides significant support to the biomedical domain. The application of machine learning techniques in medical applications has evolved from physicians’ needs. Screening, medical imaging, pattern classification, and prognosis are some examples of health care support systems. Typically, medical data has its own characteristics, such as being huge in size and number of features, c...
Experimental Evaluation of Discretization Schemes for Rule Induction
This paper proposes an experimental evaluation of various discretization schemes in three different evolutionary systems for inductive concept learning. The various discretization methods are used in order to obtain a number of discretization intervals, which represent the basis for the methods adopted by the systems for dealing with numerical values. Basically, for each rule and attribute, one...